Op4dTensorGeneric kernel upgrade #3458
…pen into Op4dTensorGeneric_to_HIP
I tested the performance of these kernels again, this time with different metrics: instead of comparing run times, I calculated useful computations per second (GFLOPs) and bytes transferred to/from memory per second (GB/s). The tests were run with the generated test cases.
Comparison with old Op4dTensorGeneric
Comparison with Op4dTensorLite
Comparison with OpTensorFwdBias
Comparison with OpTensorLeadingOnes
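The two metrics used in the comparisons above (GFLOPs and GB/s) can be derived from a measured kernel time roughly as follows. This is only a sketch under stated assumptions, not the measurement code used in this PR: `Throughput` and `compute_throughput` are hypothetical names, the number of floating-point operations per output element is supplied by the caller, and the memory traffic model assumes A is read once, B is read once (B may have fewer elements when it is broadcast), and C is written once.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration; not part of MIOpen.
struct Throughput
{
    double gflops; // useful floating-point operations per second, in units of 1e9
    double gbps;   // bytes moved to/from memory per second, in units of 1e9
};

// c_elems        : number of elements in A and C (assumed to have the same shape)
// b_elems        : number of elements in B (smaller than c_elems when B is broadcast)
// elem_size      : size of one element in bytes, e.g. 4 for float
// flops_per_elem : assumed useful operations per output element
// elapsed_ms     : measured kernel time in milliseconds
inline Throughput compute_throughput(std::size_t c_elems,
                                     std::size_t b_elems,
                                     std::size_t elem_size,
                                     double flops_per_elem,
                                     double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    const double seconds = elapsed_ms * 1e-3;

    // Useful work: one set of operations per output element.
    const double flops = static_cast<double>(c_elems) * flops_per_elem;

    // Assumed traffic: read A, read B, write C.
    const double bytes =
        static_cast<double>(2 * c_elems + b_elems) * static_cast<double>(elem_size);

    return Throughput{flops / seconds / 1e9, bytes / seconds / 1e9};
}
```

Unlike raw run time, both numbers are normalized by tensor size, so results for small and large tensors can be read against the same scale.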
MIOpen is moving to the new monorepo setup and all older unmerged PRs are being closed. Please re-open this as part of the new repo if these changes are still needed.
This PR adds a new, upgraded Op4dTensorGeneric kernel as part of porting kernels from OCL to HIP.
Below is a performance comparison (speed-ups and slowdowns) between the new Op4dTensorGeneric kernel and the other OpTensor kernels used for 4D tensors.
This PR is opened as a draft for now. If everyone is OK with the new Op4dTensorGeneric kernel, I will update this PR and replace the old kernel with the new one.
Test cases were generated and run from the tensor_4d_generic_ocl_hip.cpp file; the largest tensor is 128MB.
New Op4dTensorGeneric - Old OpTensorFwdBias (B - 1C11 case)
New Op4dTensorGeneric - Old OpTensorLeadingOnes (B - N111, NC11, NCH1, 1111)
New Op4dTensorGeneric - Old Op4dTensorLite (B - NCHW)
New Op4dTensorGeneric - Old Op4dTensorGeneric (B - all cases)
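The B shapes listed above (1C11, N111, NC11, NCH1, 1111, NCHW) are all broadcast cases of a 4D tensor B against an NCHW tensor A. As a rough host-side illustration of how one generic routine can cover every one of them, the sketch below gives each length-1 dimension of B a stride of 0. This is not the kernel from this PR: `Dims`, `broadcast_strides`, and `add_broadcast` are hypothetical names, the operation is fixed to addition, and tensors are assumed to be packed in NCHW order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Dims = std::array<std::size_t, 4>; // N, C, H, W

// Packed NCHW strides for B, with stride 0 for every dimension of length 1,
// so that the same element of B is reused along that dimension.
inline Dims broadcast_strides(const Dims& b_dims)
{
    Dims strides{};
    std::size_t running = 1;
    for(int i = 3; i >= 0; --i)
    {
        strides[i] = (b_dims[i] == 1) ? 0 : running;
        running *= b_dims[i];
    }
    return strides;
}

// Reference implementation of C = A + B, with B broadcast to the shape of A.
inline std::vector<float> add_broadcast(const std::vector<float>& a,
                                        const Dims& a_dims,
                                        const std::vector<float>& b,
                                        const Dims& b_dims)
{
    // Each dimension of B must either match A or be 1.
    for(int i = 0; i < 4; ++i)
        assert(b_dims[i] == a_dims[i] || b_dims[i] == 1);

    const Dims bs = broadcast_strides(b_dims);
    std::vector<float> c(a.size());
    std::size_t a_idx = 0; // A and C are packed, so their index is linear
    for(std::size_t n = 0; n < a_dims[0]; ++n)
        for(std::size_t ch = 0; ch < a_dims[1]; ++ch)
            for(std::size_t h = 0; h < a_dims[2]; ++h)
                for(std::size_t w = 0; w < a_dims[3]; ++w, ++a_idx)
                {
                    const std::size_t b_idx =
                        n * bs[0] + ch * bs[1] + h * bs[2] + w * bs[3];
                    c[a_idx] = a[a_idx] + b[b_idx];
                }
    return c;
}
```

For example, with A of shape {1, 2, 1, 2} holding {1, 2, 3, 4} and B of shape {1, 2, 1, 1} holding {10, 20} (the 1C11 case), the result is {11, 12, 23, 24}. The specialized kernels in the comparison target individual shapes, whereas a stride-based scheme like this one handles all of them through the same indexing.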